Integer Matrix Approximation and Data Mining

Authors

  • Bo Dong
  • Matthew M. Lin
  • Haesun Park
Abstract

Integer datasets appear frequently in science and engineering applications. To analyze these datasets, we consider an integer matrix approximation technique that can preserve the characteristics of the original dataset. Because integers are discrete in nature, to the best of our knowledge, no previously proposed technique developed for real numbers can be applied successfully. In this study, we first conduct a thorough review of current algorithms for solving integer least squares problems, and then we develop an alternative least squares method based on integer least squares estimation to obtain an integer approximation of integer matrices. We discuss numerical applications to the approximation of randomly generated integer matrices as well as to association rule mining, cluster analysis, and pattern extraction. Our computed results suggest that the proposed method can calculate a more accurate solution for discrete datasets than other existing methods.
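To make the idea of an integer approximation built from integer least squares steps concrete, the following is a minimal sketch restricted to the rank-1 case, A ≈ u vᵀ with integer u and v. It is not the authors' algorithm. With one factor held fixed, each entry of the other factor solves a one-dimensional integer least squares problem, and in one dimension rounding the real least squares solution is exactly optimal. For higher ranks this rounding shortcut is no longer exact and a dedicated integer least squares solver would be needed. The function names, the `v0` parameter, and the default starting vector are assumptions made for illustration.

```python
def residual(A, u, v):
    """Squared Frobenius error of the rank-1 integer approximation u v^T."""
    return sum((A[i][j] - u[i] * v[j]) ** 2
               for i in range(len(A)) for j in range(len(A[0])))


def rank1_integer_approx(A, v0=None, iters=20):
    """Alternate exact scalar integer least squares updates for u and v.

    A is a list of equal-length rows of integers. v0 is an optional integer
    starting vector (hypothetical parameter); by default the row of A with
    the largest squared norm is used, which is only a heuristic.
    """
    m, n = len(A), len(A[0])
    v = list(v0) if v0 is not None else list(
        max(A, key=lambda row: sum(x * x for x in row)))
    u = [0] * m
    for _ in range(iters):
        vv = sum(x * x for x in v)
        if vv == 0:
            break
        # With v fixed, u[i] = round(<A[i,:], v> / <v, v>) is the exact
        # integer minimizer of the i-th row's squared error.
        new_u = [round(sum(A[i][j] * v[j] for j in range(n)) / vv)
                 for i in range(m)]
        uu = sum(x * x for x in new_u)
        if uu == 0:
            u = new_u
            break
        # With u fixed, the same argument applies column by column.
        new_v = [round(sum(A[i][j] * new_u[i] for i in range(m)) / uu)
                 for j in range(n)]
        if new_u == u and new_v == v:
            break  # no change: a fixed point of the alternation
        u, v = new_u, new_v
    return u, v


if __name__ == "__main__":
    A = [[2, 4, 6], [1, 2, 3], [3, 6, 9]]
    u, v = rank1_integer_approx(A, v0=[1, 2, 3])
    print(u, v, residual(A, u, v))
```

Because every update is an exact minimizer of its subproblem, the error never increases from one step to the next, but the alternation can stop at a local minimum, so the result depends on the starting vector.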

Similar resources

Mathematical Programming in Machine Learning and Data Mining

The field of Machine Learning (ML) and Data Mining (DM) is focused around the following problem: Given a data domain D we want to approximate an unknown function y(x) on the given data set X ⊂ D (for which the values of y(x) may or may not be known) by a function f from a given class F so that the approximation generalizes in the best possible way on all of the (unseen) data x ∈ D. The approximat...

Calculation of One-dimensional Forward Modelling of Helicopter-borne Electromagnetic Data and a Sensitivity Matrix Using Fast Hankel Transforms

The helicopter-borne electromagnetic (HEM) frequency-domain exploration method is an airborne electromagnetic (AEM) technique that is widely used for vast and rough areas for resistivity imaging. The vast amount of digitized data flowing from the HEM method requires an efficient and accurate inversion algorithm. Generally, the inverse modelling of HEM data in the first step requires a precise a...

Reducing Dimensionality in Text Mining using Conjugate Gradients and Hybrid Cholesky Decomposition

Generally, data mining in larger datasets faces certain limitations in identifying the relevant datasets for the given queries. The limitations include: lack of interaction in the required objective space; inability to handle the data sets or discrete variables in datasets, especially in the presence of missing variables; inability to classify the records as per the given query; and fi...

Integer Matrix Factorization and Its Application

Matrix factorization has been of fundamental importance in modern sciences and technology. This work investigates the notion of factorization with entries restricted to integers or binaries, where the “integer” could be either the regular ordinal integers or just some nominal labels. Being discrete in nature, such a factorization or approximation cannot be accomplished by conventional techniq...

On parallelizing matrix multiplication by the column-row method

We consider the problem of sparse matrix multiplication by the column row method in a distributed setting where the matrix product is not necessarily sparse. We present a surprisingly simple method for “consistent” parallel processing of sparse outer products (column-row vector products) over several processors, in a communication-avoiding setting where each processor has a copy of the input. T...

Journal:
  • J. Sci. Comput.

Volume 75  Issue 

Pages  -

Publication date: 2018